Open
The Gibbs ensemble simulation would deadlock when both moltransrot and gibbs_matter moves were enabled. Root cause was RNG desynchronization between MPI processes.

Problem:
- MoveCollection::sample() uses mpi.random to select moves (must stay in sync)
- GibbsMatterMove::_move() was using mpi.random for molecule selection
- The two cells have different molecule counts (e.g., 5 vs 15)
- std::uniform_int_distribution uses rejection sampling, which makes variable numbers of engine calls for different ranges
- This caused mpi.random states to diverge between processes
- Subsequent sample() calls returned different moves on each process
- When one selected gibbs_matter (with MPI calls in bias()) and the other selected moltransrot, the first blocked waiting for MPI

Fix:
1. Use Faunus::random (local, non-MPI) for molecule selection instead of mpi.random. This keeps mpi.random synchronized for move selection.
2. Add MPI exchange to synchronize early returns. When one process fails (no molecules found or cell full) while the other succeeds, both must skip the move to avoid one blocking in bias() waiting for MPI calls.
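
The desynchronization and fix 1 can be reproduced without MPI. The sketch below is not Faunus code: `Rank`, `draw_below`, `do_step` and `count_move_mismatches` are hypothetical names, the two `std::mt19937` engines only stand in for mpi.random and Faunus::random, and the hand-written rejection sampler is a deliberately simple substitute for std::uniform_int_distribution, whose actual algorithm is implementation-defined. It assumes two ranks with 5 and 15 molecules, as in the example above.

```cpp
#include <cstdint>
#include <random>

// Illustrative rejection sampler, NOT the standard library's algorithm:
// mask the engine output to the smallest power of two >= n and retry until
// the value is below n. The number of engine calls depends on n. Needs n >= 1.
std::uint32_t draw_below(std::mt19937& engine, std::uint32_t n) {
    std::uint32_t mask = 1;
    while (mask < n) {
        mask <<= 1;
    }
    mask -= 1;
    for (;;) {
        const auto value = static_cast<std::uint32_t>(engine()) & mask;
        if (value < n) {
            return value;
        }
    }
}

// Hypothetical per-process state.
struct Rank {
    std::mt19937 shared;     // stand-in for mpi.random (must match across ranks)
    std::mt19937 local;      // stand-in for Faunus::random (free to differ)
    std::uint32_t molecules; // molecule count in this rank's cell
};

// One step: select a move from the shared stream (0 = moltransrot,
// 1 = gibbs_matter); gibbs_matter then selects a molecule.
std::uint32_t do_step(Rank& rank, bool select_with_local) {
    const std::uint32_t move = draw_below(rank.shared, 2); // always one engine call
    if (move == 1) {
        draw_below(select_with_local ? rank.local : rank.shared, rank.molecules);
    }
    return move;
}

// Count the steps on which the two ranks select different moves. In the real
// simulation, the first such step is where one process blocks in MPI.
int count_move_mismatches(int steps, bool select_with_local) {
    Rank a{std::mt19937(42), std::mt19937(1), 5};
    Rank b{std::mt19937(42), std::mt19937(2), 15};
    int mismatches = 0;
    for (int i = 0; i < steps; ++i) {
        if (do_step(a, select_with_local) != do_step(b, select_with_local)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling `count_move_mismatches(steps, true)` always returns zero, because the shared stream is consumed exactly once per step on both ranks. With `false`, molecule selection draws a range-dependent number of values from the shared stream, so the two ranks can drift apart and start selecting different moves.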
IVinterbladh (Collaborator) approved these changes on Jan 27, 2026:
Tested and it works!
Summary
Fixes MPI deadlock when both moltransrot and gibbs_matter moves are enabled in Gibbs ensemble simulations.

Root cause: RNG desynchronization between MPI processes due to std::uniform_int_distribution making variable numbers of engine calls for different ranges (rejection sampling). The two cells have different molecule counts, causing mpi.random states to diverge, which led to different move selections and eventual deadlock.

Fix:
- Use Faunus::random (local) for molecule selection instead of mpi.random

Test plan